CS1.503 Mathematical Foundations of Data Science (Monsoon 2023)
Announcements
- Instructors: Girish Varma, Suryajith Chillara
- Teaching assistants: Nithish Raja
- Schedule: Lectures: Tuesday and Friday, 14:00 – 15:25, Tutorial: Saturday, 12:40 - 13:40
- Classroom: Digital classroom, B5 Vindhya
- Course portal: elearn @ IIIT Hyderabad
Tentative topics
-
Estimation from Random Samples
- Examples in Vote Share surveys, Medical Tests etc.
- Mathematical Formulation in terms of Random Variables
- Linearity of Expectation & Tail Bounds
- Gaussian & Confidence Intervals
- Predictions, Precision/Recall Curve
-
WWW Graph, Page Rank & Eigenvalues
- World Wide Web Graph and Ranking Problem
- Random Walks and Eigenvalues
- Stationery Distributions and Degree
- Convergence and Second Largest Eigen Value
-
Dimensionality Reduction
- Dimension Reduction Problem
- Examples: Axis of a Spiral Galaxy, Recommender Systems
- Singular Value Decomposition
- Best fit subspaces from SVD
- Low Rank Assumption and Applications
- Projection to Random Subspace (Johnson-Lindenstrauss)
-
Data Streaming Algorithms
- Finding distinct elements, missing numbers and duplicates
- Streaming algorithms
- Fingerprinting Method
- Frequency Moments and k-wise Independence
- Limits of Streaming Algorithms
-
Nearest Neighbor Search, Hashing and Clustering
- Nearest Neighbor Classifier
- Hashing
- Appropriate NN from Locally Sensitive Hashing
- Clustering
-
Sublinear time algorithms
- Property testing
- Sublinear time algorithms for graphs
- Sublinear time algorithms for boolean functions
- Distribution testing
-
Supervised Learning
- PAC learning
- Infinite hypothesis classes and VC dimension
- Learning linear functions using gradient updates
References
- Avrim Blum, John Hopcroft, and Ravindran Kannan (2018), Foundations of Data Science, Free draft on web.
- Martin J. Wainwright and Michael I. Jordan (2018), Graphical Models, Exponential Families, and Variational Inference, Now publishers.
- S. Muthukrishnan (2004), Data Streams: Algorithms and Applications, survey.
- Moran Feldman (2020), Algorithms for Big Data, World Scientific.
- Shai Shalev-Schwartz and Shai Ben David (2014), Understanding Machine Learning: From theory to Algorithms, Cambridge University Press.